Abstract

Let $\phi$ be a 3CNF formula with n variables and m clauses. A simple nonconstructive argument shows that when m is sufficiently large compared to n, most 3CNF formulas are not satisfiable. It is an open question whether there is an efficient refutation algorithm that for most such formulas proves that they are not satisfiable. A possible approach to refute a formula $\phi$ is: first, translate it into a graph $G_\phi$ using a generic reduction from 3-SAT to max-IS, then bound the maximum independent set of $G_\phi$ using the Lovász $\vartheta$ function. If the $\vartheta$ function returns a value $< m$, this is a certificate for the unsatisfiability of $\phi$. We show that for random formulas with $m < n^{3/2 - o(1)}$ clauses, the above approach fails, i.e. the $\vartheta$ function is likely to return a value of m.

1 Introduction 

A 3CNF formula over n variables is a set of m clauses, where each clause contains exactly 3 literals. A formula $\phi$ is satisfiable if there is an assignment to its n variables that sets at least one literal in every clause to "true". The 3-SAT problem of deciding whether an input 3CNF formula $\phi$ is satisfiable is NP-hard. In this paper we consider a certain heuristic for 3-SAT. A heuristic for satisfiability may try to find a satisfying assignment for an input formula $\phi$ if one exists. A refutation heuristic may try to prove that no satisfying assignment exists.

How does one measure the quality of a refutation heuristic? A possible test may be to check how good the heuristic is on a random input. But then, how do we generate a random unsatisfiable formula? To answer this question we review some known properties of random 3CNF formulas. The satisfiability property has the following interesting threshold behavior. Let $\phi$ be a random 3CNF formula with n variables and cn clauses (each new clause is chosen independently and uniformly from the set of all possible clauses). As the parameter c governing the density of the formula is increased, it becomes less likely that $\phi$ is satisfiable, as there are more constraints to satisfy. In [8] it is shown that there exists $c_n$ such that for $c < c_n(1 - \epsilon)$ almost surely $\phi$ is satisfiable, and for $c > c_n(1 + \epsilon)$, $\phi$ is almost surely unsatisfiable (for some $\epsilon$ which tends to zero as n increases). It is also known that $3.52 < c_n < 4.596$ and it is widely believed that $c_n$ converges to some constant c. We will use random formulas with cn clauses (for $c > c_n(1 + \epsilon)$) to measure the performance of a refutation heuristic. Notice that for any n, as c is increased (for $c > c_n(1 + \epsilon)$), the algorithmic problem of refutation becomes less difficult since we can always ignore a fixed fraction of the clauses.

In this paper we analyse a semidefinite programming based refutation algorithm which was introduced in earlier work (see Section 1.1), and show that for random formulas of certain densities (well above the satisfiability threshold) this algorithm fails.

The algorithm itself is simple to describe (to readers familiar with some of 
the previous work). 

1. Given an input 3CNF formula $\phi$, apply to it a standard reduction from max 3-SAT to maximum independent set, resulting in a graph $G_\phi$. The size of the maximum independent set in $G_\phi$ is equal to the maximum number of clauses that can be simultaneously satisfied in $\phi$.

2. Compute the Lovász $\vartheta$ function of the graph $G_\phi$. This provides an upper bound on the size of the maximum independent set of $G_\phi$.

3. If $\vartheta(G_\phi) < m$, then output "unsatisfiable". Otherwise return "do not know".

We now describe the graph $G_\phi$ in more detail. Recall that for a 3CNF clause, there are seven different assignments to its three literals that satisfy the clause. For each clause of $\phi$ the graph contains a clique of 7 vertices, which we call a cloud. Hence $G_\phi$ contains 7m vertices. Each vertex of the clause cloud is associated with a different assignment to the three literals of the clause that satisfies the clause. Vertices of different clouds are connected by an edge if they are associated with contradicting assignments. (Namely, if there is a variable that is assigned to true by one of the assignments and to false by the other. For the same reason, the vertices within a cloud form a clique.)
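
The construction of the graph can be sketched in a few lines of Python. This is an illustration only, not part of the analysis: the clause encoding (signed integers, where 3 stands for the variable with index 3 and -3 for its negation) and the name `build_graph` are assumptions of the sketch, and every clause is assumed to contain three distinct variables.

```python
from itertools import combinations, product

def build_graph(clauses):
    """Build the clouds-and-conflicts graph of a 3CNF formula.

    Hypothetical encoding: a clause is a triple of nonzero ints,
    e.g. (1, -2, 3) for (x1 or not-x2 or x3), over distinct variables.
    """
    vertices = []  # each vertex: (clause index, {variable: value})
    for idx, clause in enumerate(clauses):
        variables = [abs(lit) for lit in clause]
        for values in product((0, 1), repeat=3):
            # A literal is true when the variable's value agrees with its sign.
            if any((lit > 0) == bool(val) for lit, val in zip(clause, values)):
                vertices.append((idx, dict(zip(variables, values))))

    edges = set()
    for a, b in combinations(range(len(vertices)), 2):
        asg_a, asg_b = vertices[a][1], vertices[b][1]
        shared = asg_a.keys() & asg_b.keys()
        # Contradicting assignments disagree on some shared variable.
        if any(asg_a[v] != asg_b[v] for v in shared):
            edges.add((a, b))
    return vertices, edges

vertices, edges = build_graph([(1, 2, 3), (-1, 4, 5)])
print(len(vertices), len(edges))
```

Two distinct vertices of the same cloud always disagree on some variable of their clause, so each cloud comes out as a clique without any special handling.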

The $\vartheta$ function of any graph G upper bounds the maximum independent set in it, and can be computed in polynomial time up to arbitrary precision, using semidefinite programming. The fact that the vertices of $G_\phi$ can be covered by m cliques implies that $\vartheta(G_\phi) \le m$. Thus, if $\phi$ is satisfiable then the value of the theta function will be exactly m. If the value of the theta function is $< m$ then $\phi$ is unsatisfiable.

The above algorithm has one-sided error, in the sense that it will never say "unsatisfiable" on a satisfiable formula, but for some unsatisfiable formulas it will fail to output "unsatisfiable". If for some formula $\phi$ the algorithm outputs "unsatisfiable", then the algorithm's execution on $\phi$ is a witness for the unsatisfiability of $\phi$.

Our main result is that for a random 3CNF formula with $m < n^{3/2 - o(1)}$ clauses it is very likely that $\vartheta(G_\phi) = m$.

1.1 Related work



A possible approach for refuting a formula $\phi$ is to find a resolution proof for the unsatisfiability of $\phi$. However, Chvátal and Szemerédi proved that a resolution proof of a random 3CNF formula with a linear number of clauses is almost surely of exponential size. A result of a similar flavor for denser formulas was given by Ben-Sasson and Wigderson [3], who showed that a random formula with $n^{3/2 - \epsilon}$ clauses almost surely requires a resolution proof of size $2^{\Omega(n^{\epsilon/(1-\epsilon)})}$. These lower bounds imply that finding a resolution proof for a random formula is computationally inefficient.

A simple refutation algorithm can be used to refute random instances with $cn^2$ clauses, when c > 2/3. This is done by selecting all the clauses that contain a variable x. Fixing x to be true leaves about half of the selected clauses as a random 2-CNF formula with roughly 3cn/2 > n clauses. This formula is unlikely to be satisfiable, and its nonsatisfiability can be verified by a polynomial time algorithm for 2SAT. The same can be done when fixing x to be false.

A spectral approach introduced by Goerdt and Krivelevich [10] gave a significant improvement and reduced the bound to $\mathrm{poly}(\log n) \cdot n^k$ clauses for efficient refutation of 2k-CNF formulas. This was later improved by [7], which showed how to efficiently refute random 2k-CNF instances with at least $cn^k$ clauses. The basic approach for refutation of 2k-CNF formulas was later extended to handle also random 3CNF formulas, first with $\mathrm{poly}(\log n) \cdot n^{3/2}$ clauses and then with $cn^{3/2}$ clauses. Our current result gives a somewhat weak indication that spectral methods cannot break the $n^{3/2 - o(1)}$ barrier.

Further motivation for studying efficient refutability of random 3CNF formulas comes from earlier work, where it is shown that if there is no polynomial time refutation heuristic that works for most 3CNF formulas with cn clauses (where c is an arbitrarily large constant) then certain combinatorial optimization problems (like minimum graph bisection, the dense k-subgraph, and others) have no polynomial time approximation schemes. It is an open question whether it is NP-hard to approximate these problems arbitrarily well, though further evidence that these problems are indeed hard to approximate is given in [15].

The algorithm considered in the current paper for refuting $\phi$ by computing $\vartheta(G_\phi)$ was presented in earlier work. There it was shown that when $m < n^{3/2 - o(1)}$, almost surely $\vartheta(G_\phi) \ge (1 - o(1))m$. Our current work overcomes a difficulty that prevented that approach from showing that $\vartheta(G_\phi) = m$, even for formulas with a linear number of clauses. The difficulty was the existence of pairs of clauses that share two variables.

Related algorithms for refuting CNF formulas were analysed in [2]. There the authors considered a certain linear programming relaxation of the satisfiability problem, and successive tightenings of this relaxation via the operators of Lovász and Schrijver. The authors of [2] show that in order to refute a random 3CNF formula with cn clauses (where c is a sufficiently large constant) one has to apply $\Omega(n)$ rounds of the Lovász-Schrijver operator to the initial relaxation. Our results deal only with the Lovász $\vartheta$ function, which lies at the lowest level of the Lovász-Schrijver hierarchy (for the maximum independent set relaxation). In this respect, the results in [2] are stronger than ours. However, we believe that our results are of independent interest. (In fact, they were obtained independently of and roughly concurrently with the results of [2].) One superficial difference is that we consider denser 3CNF formulas. This difference is only superficial, because the results of [2] also extend to denser formulas, by limiting them to the lower levels of the Lovász-Schrijver hierarchy. A more substantial difference is that the starting point of [2], which is a linear program relaxation of 3CNF, is different from ours. We first apply a reduction to the 3CNF formula, inducing a graph, and only then apply the Lovász $\vartheta$ function to the induced graph. It is not obvious (at least to us) what is the minimal i for which the i-th relaxation used in [2] is stronger than the relaxation we use (such i exists since the n-th relaxation always returns the correct answer). And finally, there are differences between our proof techniques and those of [2]. We present a solution to the vector formulation of the $\vartheta$ function, whereas [2] present a solution to the matrix formulation of their relaxation.

2 Results 

Instead of working with $G_\phi$ we work with an induced subgraph of $G_\phi$ that is derived from $G_\phi$ by retaining in each clause cloud only the vertices corresponding to satisfying 3XOR assignments of the clause. Namely, for each clause we keep those four vertices that are associated with assignments that satisfy an odd number of literals in the clause. We call this subgraph $G_\phi^{xor}$. Since $G_\phi^{xor}$ is an induced subgraph of $G_\phi$ it follows (by known monotonicity properties of the theta function) that $\vartheta(G_\phi^{xor}) \le \vartheta(G_\phi)$. We show that when $m < n^{3/2 - o(1)}$, w.h.p. $\vartheta(G_\phi^{xor}) = m$, which by the above discussion implies that also $\vartheta(G_\phi) = m$.
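
As a small illustration of which vertices survive in the XOR subgraph, the following Python sketch lists the assignments of a single clause cloud. The signed-integer clause encoding and the name `cloud` are assumptions made for this sketch only.

```python
from itertools import product

def cloud(clause, xor_only=False):
    """Assignments (values of the clause's three variables) kept in a cloud.

    Hypothetical encoding: clause = triple of nonzero ints, negative = negated.
    With xor_only=True only assignments satisfying an odd number of
    literals are kept, as in the XOR subgraph.
    """
    kept = []
    for values in product((0, 1), repeat=3):
        satisfied = sum((lit > 0) == bool(val) for lit, val in zip(clause, values))
        if satisfied == 0:
            continue  # falsifies the clause: never a vertex
        if xor_only and satisfied % 2 == 0:
            continue  # satisfies an even number of literals: dropped
        kept.append(values)
    return kept

print(cloud((1, 2, 3)))                 # the 7 satisfying assignments
print(cloud((1, 2, 3), xor_only=True))  # the 4 assignments of odd parity
```

For a clause with a negated literal the retained assignments shift accordingly, since parity is counted over satisfied literals rather than over variable values.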

Theorem 2.1. Let $\phi$ be a random 3CNF formula with $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$ clauses and n variables. With high probability $\vartheta(G_\phi^{xor}) = m$.

Corollary 2.2. Let $\phi$ be a random 3CNF formula with $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$ clauses and n variables. With high probability $\vartheta(G_\phi) = m$.

For $G_\phi^{xor}$ our results are nearly optimal in terms of the density of the underlying 3CNF formula $\phi$.

Proposition 2.3. Let $\phi$ be a random 3CNF formula with $m \ge cn^{3/2}$ clauses and n variables, where c is a sufficiently large constant. With high probability $\vartheta(G_\phi^{xor}) < m$.

We suspect that when $m \ge cn^{3/2}$ then also $\vartheta(G_\phi) < m$, although we did not prove it (when $m \ge cn^{3/2}$ there are other refutation methods that succeed, see [13] for details).

For convenience, from now on we will refer to $\vartheta(G_\phi^{xor})$ also as $SDP(\phi)$. We prove Theorem 2.1 in two steps. First we introduce a simple refutation proof system that we call narrow Gauss Elimination 3 (in short GE3) and prove that it is stronger than $SDP(\phi)$, i.e. if $\phi$ cannot be refuted by GE3 then $SDP(\phi) = m$. We then show that a random 3XOR formula with $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$ clauses almost surely cannot be refuted by GE3.

Definition 2.4. The GE3 proof system works as follows. It receives as input a system of linear equations modulo 2, where every equation has at most three literals. It succeeds in refuting the system of linear equations if it manages to derive the equation 0 = 1. A new equation can be derived only if it contains at most three variables, and it is the result of adding exactly two existing equations and simplifying the result mod 2. By simplifying modulo 2 we mean that 1 + 1 = 0, $x_i + x_i = 0$ and $x_i + \bar{x}_i = 1$, for every variable $x_i$.

To clarify the derivation rule of GE3, consider the following three linear equations: $x_1 + x_2 + x_3 = 1$, $x_1 + x_4 + x_5 = 1$ and $x_2 + x_4 + x_6 = 1$. No new equation can be derived by the GE3 proof system, because adding any two equations produces an equation with four variables. In particular, the equation $x_3 + x_5 + x_6 = 1$ cannot be derived, even though it contains only three variables and is implied by the original equations (by adding the three of them).
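
The derivation rule can be made concrete with a short Python sketch that saturates a system under GE3. The representation of an equation as a pair (set of variable indices, right-hand side) and the names `eq`, `ge3_closure` and `ge3_refutes` are choices made for this sketch; a negated literal is assumed to have been folded into the right-hand side beforehand.

```python
from itertools import combinations

def eq(variables, rhs):
    """Equation sum(variables) = rhs (mod 2), as a hashable pair."""
    return (frozenset(variables), rhs)

def ge3_closure(equations):
    """All equations derivable by repeatedly adding two existing equations,
    keeping a sum only if it has at most three variables."""
    derived = set(equations)
    changed = True
    while changed:
        changed = False
        for (va, ra), (vb, rb) in combinations(list(derived), 2):
            # Adding mod 2: shared variables cancel, right-hand sides add.
            new = (va ^ vb, ra ^ rb)
            if len(new[0]) <= 3 and new not in derived:
                derived.add(new)
                changed = True
    return derived

def ge3_refutes(equations):
    """True if the equation 0 = 1 is derivable."""
    return (frozenset(), 1) in ge3_closure(equations)

system = [eq({1, 2, 3}, 1), eq({1, 4, 5}, 1), eq({2, 4, 6}, 1)]
print(ge3_closure(system) == set(system))     # nothing new is derivable
print(eq({3, 5, 6}, 1) in ge3_closure(system))
```

The loop terminates because there are only finitely many equations with at most three variables, which is the same observation used later in the proof of Lemma 2.5.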

Observe that if an equation $e_1$ containing only two variables is derived in GE3 (say, $x_1 + x_2 = 0$), then in every other equation $e_2$ we can use GE3 to replace the occurrence of one of the variables by the other, by adding $e_1$ and $e_2$.

The proof of Theorem 2.1 is an immediate consequence of the following two lemmas.

Lemma 2.5. Let $\phi$ be any formula with m clauses. If $\phi$ cannot be refuted by GE3 then $SDP(\phi) = m$.

Lemma 2.6. Let $\phi$ be a random 3XOR formula with n variables and $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$ clauses. With high probability GE3 cannot refute $\phi$.

3 SDP formulation of the $\vartheta$ function

For each vertex i we assign a vector $v_i$. There is also a special vector $v_0$. The semidefinite program is:

maximize $\sum_{i \ge 1} \langle v_0, v_i \rangle$
subject to:
$\langle v_0, v_0 \rangle = 1$ (1)
for every $i \ge 1$: $\langle v_i, v_i \rangle = \langle v_i, v_0 \rangle$ (2)
for every pair i, j: $\langle v_i, v_j \rangle \ge 0$ (3)
for any edge (i, j): $\langle v_i, v_j \rangle = 0$ (4)

Instantiating the above semidefinite program for the graph $G_\phi^{xor}$ we derive the following semidefinite program, in which for clause i there are 4 assignment vectors $v_i^j$, one for every assignment of its three variables that satisfies an odd number of literals in the clause.

maximize $\sum_{i = 1..m,\ j = 1..4} \langle v_0, v_i^j \rangle$
subject to:
$\langle v_0, v_0 \rangle = 1$ (5)
for every vector: $\langle v_i^j, v_i^j \rangle = \langle v_i^j, v_0 \rangle$ (6)
for every pair of vectors: $\langle v_i^j, v_{i'}^{j'} \rangle \ge 0$ (7)
for every pair of contradicting vectors: $\langle v_i^j, v_{i'}^{j'} \rangle = 0$ (8)

(A pair of vectors is contradicting if there is some variable that the assignment associated with one of the vectors sets to true, and the assignment associated with the other vector sets to false.)

The value of the second semidefinite program is at most m because every clause cloud forms a clique. As the following known lemma shows, the contribution of a clique to the objective function is at most 1.

Lemma 3.1. Let $v_0$ be a unit vector and let $v_1, v_2, v_3, v_4$ be orthogonal vectors, such that $\langle v_i, v_i \rangle = \langle v_i, v_0 \rangle$ for all i. Then $\sum_{i=1}^{4} \langle v_0, v_i \rangle \le 1$.

Proof. Since $v_0$ is a unit vector and $v_1, v_2, v_3, v_4$ are orthogonal, it holds that $\sum_{i=1}^{4} \langle v_0, v_i / \|v_i\| \rangle^2 \le 1$ (vectors with $v_i = 0$ contribute nothing and can be dropped). It thus follows that

$$1 \ge \sum_{i=1}^{4} \left\langle v_0, \frac{v_i}{\|v_i\|} \right\rangle^2 = \sum_{i=1}^{4} \frac{\langle v_0, v_i \rangle^2}{\|v_i\|^2} = \sum_{i=1}^{4} \langle v_0, v_i \rangle$$

where the last equality follows from $\|v_i\|^2 = \langle v_0, v_i \rangle$. □

Note that Lemma 3.1 implies that for any graph G, if the vertices of G can be covered by p cliques, then $\vartheta(G) \le p$.

4 Proofs

We will use the SDP formulation of the $\vartheta$ function as it appears in Section 3.

Proof of Lemma 2.5. Apply the derivation rule of the GE3 system as long as new equations are generated by it. Since the number of possible equations with at most three variables is $O(n^3)$, this procedure must end. Assume that the equation 0 = 1 could not be derived. Hence we are left with equations containing one variable (meaning that the value of this variable must be fixed to a constant), two variables (meaning that their values must be identical, or sum up to 1, depending on the free constant in the equation), or three variables.
The information that GE3 derives about $\phi$ allows us to partition all literals into equivalence classes of the form:

$S_1 : x_1 = x_{18} = \ldots = \bar{x}_9$   ($\bar{S}_1 : \bar{x}_1 = \bar{x}_{18} = \ldots = x_9$)
$S_2 : \bar{x}_4 = x_{20} = \ldots = \bar{x}_5$   ($\bar{S}_2 : x_4 = \bar{x}_{20} = \ldots = x_5$)
...
$S_9 : x_2 = x_{21} = \ldots = x_{30}$   ($\bar{S}_9 : \bar{x}_2 = \bar{x}_{21} = \ldots = \bar{x}_{30}$)
$S_F : 1 = \bar{x}_6 = \bar{x}_{11} = \ldots = x_8$   ($\bar{S}_F : 0 = x_6 = x_{11} = \ldots = \bar{x}_8$)      (9)

Notice that each equivalence class $S_i$ has a "mirror" part $\bar{S}_i$; we think of these two parts as one class. A class might contain only one variable. We call a variable fixed if it belongs either to $S_F$ or to the mirror of $S_F$. Other variables are called free. Similarly, except $S_F$ which is fixed, all other classes are free. A variable is fixed if and only if the GE3 refutation system can derive a clause containing only this variable (equal to a constant). Two free variables belong to the same class if and only if the GE3 system can derive a clause containing only these two variables.

Each original clause of $\phi$ is of one of the following types:

1. It contains three free variables, each from a distinct equivalence class.

2. It contains one fixed variable and two free variables from the same equivalence class.

3. It contains three fixed variables.

We now explain why the above three types cover all clauses. If a clause has no fixed variable then its variables must be from distinct classes (type 1), as otherwise two of them will cancel out and cause the other variable to be fixed. If a clause has exactly one fixed variable then the other two belong to the same class and they are free (type 2). A clause cannot have exactly two fixed variables as the remaining variable will be also fixed (thus the remaining case is type 3).

We will now give values to the vectors corresponding to all clauses. These vectors will satisfy the SDP constraints and will also give a value of m. An assignment for a clause that contradicts the information gathered by GE3 is called illegal; otherwise it is legal. For example, for the equivalence classes given above, an assignment such as $x_{11} = 1$ is illegal because it contradicts $S_F$; also an assignment such as $(x_1, x_9, x_{11}) = (1, 1, 0)$ is illegal because it contradicts $S_1$. We will use the following guidelines:

• Each vector has l coordinates, numbered from 0 to l − 1. For $1 \le i \le l - 1$, coordinate i will correspond to free class i.

• A clause vector that corresponds to an illegal assignment will be set to the zero vector 0. For a clause of type (1) the clause cloud will have four assignments with non-zero vectors, for a clause of type (2) there will be two assignments, and for a clause of type (3) there will be one assignment.

• Let c be a clause that has i different free classes ($i \in \{0, 1, 3\}$). The vectors corresponding to legal assignments of c will have exactly 1 + i non-zero entries. The only non-zero coordinates are 0 and the coordinates corresponding to the indices of the free classes.

Notice that the second bullet can be interpreted as removing from $G_\phi^{xor}$ all the vertices corresponding to illegal assignments. Thus from now on we will assume that such vertices are indeed removed from $G_\phi^{xor}$. To simplify the notation in the remainder of the proof, we do the following. With each subclass $S_i$ we associate a literal $s_i$ (and with $\bar{S}_i$ we associate $\bar{s}_i$). We translate each clause $c = (x_9, x_2, x_5)$ into a new clause $\tilde{c} = (\bar{s}_1, s_9, \bar{s}_2)$ by replacing each literal $x_i$ of c with the literal corresponding to the unique subclass which contains $x_i$. Note that the subclass literal replacing the literal $x_i$ may have polarity opposite to $x_i$ (if for example $x_i \in \bar{S}_i$). The new induced formula $\tilde{\phi}$ may contain some clauses with multiplicity > 1 as well as clauses in which some variable appears more than once (e.g. $(s_1, s_1, s_8)$). We will now define a homomorphism f from $G_\phi^{xor}$ to $G_{\tilde{\phi}}^{xor}$ which implies that $\vartheta(G_\phi^{xor}) \ge \vartheta(G_{\tilde{\phi}}^{xor})$ (a homomorphism $f : G \to H$ maps the vertices of G into the vertices of H while preserving the edge relation, i.e. if $(u, v) \in E(G)$ then $(f(u), f(v)) \in E(H)$). Recall that each clause $c = (x_9, x_2, x_5)$ of $\phi$ has a unique corresponding clause $\tilde{c} = (\bar{s}_1, s_9, \bar{s}_2)$ of $\tilde{\phi}$ (although other copies of $(\bar{s}_1, s_9, \bar{s}_2)$ may exist in $\tilde{\phi}$). The map f is defined only for legal satisfying assignments of $\phi$ (we already removed from $G_\phi^{xor}$ all the non-legal assignments). The map f sends the vertices (assignments) in the clause cloud of c to vertices (assignments) in the clause cloud of $\tilde{c}$ as follows: for a legal satisfying assignment of c, say $(x_9, x_2, x_5) = (1, 1, 1)$, we replace each literal $x_i$ with its corresponding class literal and leave the values as is. For example if $x_9 \in \bar{S}_1$, $x_2 \in S_9$, $x_5 \in \bar{S}_2$ then f maps the assignment $(x_9, x_2, x_5) = (1, 1, 1)$ into $(\bar{s}_1, s_9, \bar{s}_2) = (1, 1, 1)$. It is not hard to see that f maps a legal satisfying assignment for c into an assignment for $\tilde{c}$ that is both satisfying and noncontradictory (meaning for example that it will not result in one occurrence of $s_1$ being set to 0 and the other being set to 1). The assignment f returns must be noncontradictory as otherwise $\phi$ can be refuted by GE3. Note that GE3 cannot refute $\tilde{\phi}$ nor can it derive an equation like $s_i = s_j$, for $i \ne j$. From here on we show an SDP solution to $G_{\tilde{\phi}}^{xor}$.

The vector $v_0$ is set to be $(1, 0, \ldots, 0)$. The remaining vector assignments are as follows, divided by the clause types:

1. Type (1), three free distinct classes. Assume the clause is $c = (\bar{s}_1, s_2, s_4)$. The vector assignments will be:

$v_{(\bar{s}_1, s_2, s_4) = (1,1,1)} = (1/4, -1/4, 1/4, 0, 1/4, 0, \ldots, 0)$
$v_{(\bar{s}_1, s_2, s_4) = (1,0,0)} = (1/4, -1/4, -1/4, 0, -1/4, 0, \ldots, 0)$
$v_{(\bar{s}_1, s_2, s_4) = (0,1,0)} = (1/4, 1/4, 1/4, 0, -1/4, 0, \ldots, 0)$
$v_{(\bar{s}_1, s_2, s_4) = (0,0,1)} = (1/4, 1/4, -1/4, 0, 1/4, 0, \ldots, 0)$

(The coordinate of a free class $s_i$ is positive when the assignment sets $s_i$ to 1 and negative when it sets $s_i$ to 0.)

2. Type (2), one fixed class and two occurrences of some free class. Hence the equation has exactly two satisfying assignments. One assignment would get a vector that has 1/2 in its 0 coordinate and 1/2 on the coordinate corresponding to the free class, and the other would get a vector that has 1/2 in its 0 coordinate and −1/2 on the coordinate corresponding to the free class. For example, for the clause $c = (s_F, s_2, s_2)$ the vectors would be:

$v_{(s_F, s_2, s_2) = (1,1,1)} = (1/2, 0, 1/2, 0, \ldots, 0)$
$v_{(s_F, s_2, s_2) = (1,0,0)} = (1/2, 0, -1/2, 0, \ldots, 0)$

3. Type (3), three fixed classes. Assume $c = (s_F, \bar{s}_F, \bar{s}_F)$. In this case the only non-zero vector is:

$v_{(s_F, \bar{s}_F, \bar{s}_F) = (1,0,0)} = (1, 0, 0, 0, \ldots, 0)$
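
The constraints on these example vectors can be checked mechanically. The sketch below uses exact rational arithmetic and truncates every vector to coordinates 0..4. It assumes that the type (1) clause is read as (not-s1, s2, s4), which is the reading under which the signs of the listed vectors are consistent with one sign per value of each class; the names `dot` and `check_cloud` are chosen for this sketch.

```python
from fractions import Fraction
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q, h = Fraction(1, 4), Fraction(1, 2)
v0 = (1, 0, 0, 0, 0)  # coordinates 0..4; coordinate 3 is unused here

# Type (1): clause (not-s1, s2, s4), keyed by the values of its three literals.
type1 = {
    (1, 1, 1): (q, -q,  q, 0,  q),
    (1, 0, 0): (q, -q, -q, 0, -q),
    (0, 1, 0): (q,  q,  q, 0, -q),
    (0, 0, 1): (q,  q, -q, 0,  q),
}
# Type (2): clause (s_F, s2, s2).
type2 = {(1, 1, 1): (h, 0, h, 0, 0), (1, 0, 0): (h, 0, -h, 0, 0)}
# Type (3): the single legal assignment.
type3 = {(1, 0, 0): (1, 0, 0, 0, 0)}

def check_cloud(cloud):
    vectors = list(cloud.values())
    # Constraint (6): <v, v> = <v, v0> for every vector.
    norms_ok = all(dot(v, v) == dot(v, v0) for v in vectors)
    # Constraint (8) inside a cloud: distinct vectors are orthogonal.
    orthogonal = all(dot(u, v) == 0 for u, v in combinations(vectors, 2))
    # Contribution of the cloud to the objective.
    value = sum(dot(v0, v) for v in vectors)
    return norms_ok, orthogonal, value

for c in (type1, type2, type3):
    print(check_cloud(c))

# Across clouds: s2 = 1 in the type (1) vector, s2 = 0 or 1 in the type (2) vector.
print(dot(type1[(1, 1, 1)], type2[(1, 0, 0)]))  # contradicting pair
print(dot(type1[(1, 1, 1)], type2[(1, 1, 1)]))  # consistent pair
```

Each cloud contributes exactly 1 to the objective, which is how the solution reaches the value m.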

We next show that the above vector configuration is a valid solution of the $\vartheta$ function of $G_{\tilde{\phi}}^{xor}$ (it is easy to see that the above solution has value m).

Constraints of type (6) hold because of the special form of the non-zero vectors. The fact that constraints of type (7) hold will be implicit in our proof that constraints of type (8) hold, and is omitted. Hence we will only consider now constraints of type (8).

Observe first that within every clause cloud constraints of type (8) hold. Hence it remains to check (8) for pairs of different clauses that have an s variable in common. Let $c_1, c_2$ be two clauses that intersect. We continue by case analysis according to the number of distinct s variables shared by $c_1, c_2$.

1. Three distinct variables are shared: since GE3 did not deduce 0 = 1 the clauses are identical and (8) (and (7)) hold from the fact that they hold for each cloud separately.

2. Two distinct variables are shared: using GE3 we deduce that also the third variable is shared and this case was already handled.

3. Exactly one variable is shared: for simplicity, assume that each of the clauses contains 3 different variables and say $s_i$ is the shared variable. The only two indices that contribute to the inner product sum are 0 and (possibly) i. If $s_i$ is fixed the assignments cannot be contradictory and the sum is strictly positive (only coordinate 0 contributes to the sum). Assume that $s_i$ is free. Consider the case in which in each clause the other two literals are also free. If the vectors are of contradicting assignments the sum will be $(1/4)(1/4) + (-1/4)(1/4) = 0$ (or $(1/4)(1/4) + (1/4)(-1/4) = 0$). If the vectors are not of contradicting assignments, the sum is strictly positive.

Note that also in the other cases where one of the clauses contains only one or two different s variables, a similar argument works.

□ 

Proof of Lemma 2.6. We follow the line of proof given by Ben-Sasson and Wigderson [3], with some simplifications that can be applied in our case. We use the following definitions from [3]. Let A, B be any two formulas. $A \models B$ if every satisfying assignment for A is a satisfying assignment for B, or equivalently, every non-satisfying assignment of B is also a non-satisfying assignment of A. Let $\phi$ be a formula (collection of clauses) and let C be any clause. We use $\mu_\phi(C)$ to denote the minimum size of a subformula of $\phi$ that implies C, i.e. $\mu_\phi(C) = \min_{\phi' \subseteq \phi} \{ |\phi'| : \phi' \models C \}$. As $\phi$ is known from the context (and fixed) we use $\mu(C)$ instead of $\mu_\phi(C)$. The function $\mu$ is sub-additive, meaning that if $A, B \models C$ then $\mu(C) \le \mu(A) + \mu(B)$. We use 0 to denote a contradiction (the empty clause).

A simple counting argument shows that any subformula of $\phi$ of size smaller than $k = \frac{\log n}{4 \log\log n}$ is satisfiable; see Lemma 4.1. Thus, $\mu(0) \ge k$. From the sub-additivity of $\mu$, it follows that any GE3 proof of 0 contains some clause C for which $k/2 \le \mu(C) \le k$. The explanation is as follows. The derivation of 0 can be described by a tree in which every leaf has a label that equals some clause of $\phi$ and the root has a label that equals 0. For each leaf label, say A, it holds that $\mu(A) = 1$ and for the root label it holds that $\mu(0) \ge k$. Since every derived equation is the sum of exactly two equations, $\mu$ of a node is at most the sum of $\mu$ over its two children, so walking from the root towards the child of larger $\mu$ reaches such a clause C. In other words, the minimal subformula $E'$ that implies C is of size in $[k/2, k]$. The subformula $E'$ (as any other subformula of $\phi$ whose size is in $[k/2, k]$, see Lemma 4.1) has at least 4 special variables, each of which appears in exactly one clause of $E'$. We show in the next paragraph that each of these 4 special variables must be in C. This implies that C cannot be derived in GE3, contradicting the assumption that GE3 refutes $\phi$.

Let x be a special variable that belongs to some clause f of $E'$ (and not to any other clause in $E'$). From the minimality of $E'$, there exists an assignment $\alpha$ such that $f(\alpha) = C(\alpha) = 0$ but for any other clause $g \in E'$ it holds that $g(\alpha) = 1$ (as otherwise $E' \setminus \{f\} \models C$). By contradiction, assume that $x \notin C$. Changing the value of $\alpha$ only on x leaves C unsatisfied. Yet, f becomes satisfied and any other clause of $E'$ remains satisfied because x appears only in f. We deduce that after changing $\alpha$ only on x the subformula $E'$ becomes satisfied while C is not; this is a contradiction to $E' \models C$.

□ 

Lemma 4.1. Let $\phi$ be a random formula with $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$ clauses. Let $k = \frac{\log n}{4 \log\log n}$. With high probability the following properties hold.

1. Any subformula of $\phi$ of size at most k is satisfiable.

2. Any subformula $E' \subseteq \phi$, whose size is in $[k/2, k]$, has at least 4 variables, each of which belongs to exactly one clause of $E'$.

Proof. We show that any small subformula $\phi'$ is satisfiable by showing that in any such small subformula, the number of variables is at least the number of clauses. By Hall's marriage theorem, in any such subformula $\phi'$ there is a matching from the variables to the clauses that covers all the clauses, which implies that $\phi'$ is satisfiable. We now analyse the first event (proving part 1 of the lemma). Consider k clauses chosen at random. The probability that they contain less than k different variables is bounded by the probability of the following event: when throwing 3k balls into n bins, the number of non-empty bins is < k. Thus the probability for the first event is at most

$$\binom{m}{k} \sum_{i=1}^{k-1} \binom{n}{i} \left(\frac{i}{n}\right)^{3k} \le 2 \binom{m}{k} \binom{n}{k-1} \left(\frac{k-1}{n}\right)^{3k} \le 2\, \frac{e^{2k-1} (k-1)^{2k+1} m^k}{k^k\, n^{2k+1}} = o(1)$$

(the first inequality is because the sum is geometric with ratio at least 2; the last estimate holds for $k = \frac{\log n}{4 \log\log n}$).

We now bound the probability of the second event (part 2 of the lemma). Fix l to be in the interval $[k/2, k]$. Consider l clauses chosen at random. The probability that they contain less than 4 special variables equals the probability of the following event. When throwing l triplets of balls into n bins (where each triplet of balls chooses three different bins) there are less than 4 bins that contain exactly one ball. Notice that if the balls fall into more than 3(l + 1)/2 bins, there must be at least 4 bins that contain exactly one ball. The probability is thus bounded by the probability that the balls fall into at most 3(l + 1)/2 bins. To cover all possible values of $l \in [k/2, k]$ we multiply the last term by k. The resulting bound is o(1) for $m = o\left(n^{\frac{3}{2} - \frac{22 \log\log n}{\log n}}\right)$.

□ 

Proof of Proposition 2.3. A simple probabilistic argument shows that if c is large enough, $\phi$ is likely to contain four clauses of the following form (see Lemma 4.2):

$c_1 = (x_1, x_2, x_3)$   $c_3 = (x_5, x_6, x_3)$
$c_2 = (x_1, x_2, x_4)$   $c_4 = (x_5, x_6, \bar{x}_4)$

The above four clauses are contradictory (summing all of them gives 1 = 0 modulo 2).
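
The contradiction can be checked by summing the four clauses as equations modulo 2. The sketch below assumes, following the description in Lemma 4.2 that the third literal of the second clause is opposite to the third literal of the fourth, that the negated literal is the occurrence of the fourth variable in the last clause; a negated literal is folded into the right-hand side. The name `sum_mod2` is chosen for this sketch.

```python
from collections import Counter

# Each clause as (variable indices, right-hand side) of a 3XOR equation.
# The negated literal in the last clause flips its right-hand side to 0.
equations = [
    ((1, 2, 3), 1),  # x1 + x2 + x3 = 1
    ((1, 2, 4), 1),  # x1 + x2 + x4 = 1
    ((5, 6, 3), 1),  # x5 + x6 + x3 = 1
    ((5, 6, 4), 0),  # x5 + x6 + (1 + x4) = 1
]

def sum_mod2(eqs):
    """Add all equations mod 2; variables occurring an even number of times cancel."""
    counts = Counter(v for variables, _ in eqs for v in variables)
    lhs = sorted(v for v, times in counts.items() if times % 2 == 1)
    rhs = sum(r for _, r in eqs) % 2
    return lhs, rhs

print(sum_mod2(equations))
```

Every variable occurs exactly twice, so the left-hand side vanishes while the right-hand sides sum to 1.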

The $\vartheta$ function of the graph induced only by these 4 clauses has a value of ≈ 3.4142 < 4. This bound was experimentally derived by running a semidefinite programming package on Matlab. In the graph we used, vertices 1, 2, 3, 4 correspond to clause $c_1$, vertices 5, 6, 7, 8 correspond to clause $c_2$, vertices 9, 10, 11, 12 correspond to clause $c_3$ and vertices 13, 14, 15, 16 correspond to clause $c_4$. The assignment associated with each vertex is:

          x_1  x_2  x_3               x_1  x_2  x_4
  v_1      1    1    1       v_5       1    1    1
  v_2      0    1    0       v_6       0    1    0
  v_3      1    0    0       v_7       1    0    0
  v_4      0    0    1       v_8       0    0    1

          x_5  x_6  x_3               x_5  x_6  \bar{x}_4
  v_9      1    1    1       v_13      1    1    1
  v_10     0    1    0       v_14      0    1    0
  v_11     1    0    0       v_15      1    0    0
  v_12     0    0    1       v_16      0    0    1

The $\vartheta$ function of $G_\phi^{xor}$ must be smaller than m as the remaining graph (without the clouds of $c_1, c_2, c_3, c_4$) can be covered by m − 4 cliques.

□ 
Lemma 4.2. Let $\phi$ be a random formula with n variables and $m = cn^{3/2}$ random clauses. Almost surely $\phi$ contains four clauses of the form:

$c_1 = (x_1, x_2, x_3)$   $c_3 = (x_5, x_6, x_3)$
$c_2 = (x_1, x_2, x_4)$   $c_4 = (x_5, x_6, \bar{x}_4)$

Proof. We say that $a(n) \sim b(n)$ if $\lim_{n \to \infty} a(n)/b(n) = 1$. A pair of clauses is said to match if the two clauses share the same first and second literal. The expected number of matched pairs in $\phi$ is

$$\binom{m}{2} \cdot \frac{1}{2n} \cdot \frac{1}{2n} \sim \frac{c^2 n^3}{2} \cdot \frac{1}{4n^2} = \frac{c^2 n}{8}.$$

Furthermore, it can be shown that w.h.p. $\phi$ contains $\sim \frac{c^2 n}{8}$ matched pairs such that each clause of $\phi$ participates in at most one pair of matching clauses (a standard use of the second moment, see for example [7] for a proof). Assume we have $\frac{c^2 n}{8}$ matched pairs. For any such pair the third literal in each of them is still random. Fix two matched pairs $c_1, c_2$ and $c_3, c_4$. With probability $\sim \frac{1}{4n^2}$ the third literal of $c_1$ and $c_3$ is the same and the third literal of $c_2$ is opposite from the third literal of $c_4$. It thus follows that the expected number of two pairs of the form

$c_1 = (x_1, x_2, x_3)$   $c_3 = (x_5, x_6, x_3)$
$c_2 = (x_1, x_2, x_4)$   $c_4 = (x_5, x_6, \bar{x}_4)$

is

$$\binom{c^2 n / 8}{2} \cdot \frac{1}{4n^2} \sim \frac{c^4}{512}. \qquad (11)$$

Using standard techniques (such as the second moment), it can be shown that almost surely $\phi$ contains four clauses of this form. Details are omitted. □